Context-Specific and Multi-Prototype Character Representations
Abstract
Unsupervised word representations have been shown to improve predictive generalization on various NLP tasks. Much effort has been devoted to learning word embeddings effectively, but little attention has been given to distributed character representations, even though such character-level representations could be very useful for a variety of NLP applications in intrinsically “character-based” languages (e.g. Chinese and Japanese). Moreover, most existing models create a single-prototype representation per word. This is problematic because many words are in fact polysemous, and a single-prototype model cannot capture homonymy and polysemy. We present a neural network architecture that jointly learns character embeddings and induces context representations from large data sets. The explicitly produced context representations are then used to learn context-specific, multiple-prototype character embeddings that, in particular, capture the polysemous variants of each character. Our character embeddings were evaluated on three NLP tasks (character similarity, word segmentation and named entity recognition), and the experimental results demonstrated that the proposed method outperformed competing methods on all three tasks.
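The abstract does not say exactly how a context representation selects among a character's prototypes. One common approach in multi-prototype embedding work, shown here purely as an illustrative assumption and not as the authors' architecture, is to represent a context as the average of the neighboring characters' global embeddings and assign each occurrence to the prototype with the highest cosine similarity. The function names (`context_vector`, `select_prototype`), the window size of 2, and the toy 2-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def context_vector(chars: str, position: int,
                   embeddings: Dict[str, Vector], window: int = 2) -> Vector:
    """Average the global (single-prototype) embeddings of the characters
    within `window` of `position`, excluding the target character itself.
    Assumption: this averaging is an illustrative context encoder."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    count = 0
    lo, hi = max(0, position - window), min(len(chars), position + window + 1)
    for i in range(lo, hi):
        if i == position:
            continue
        vec = embeddings.get(chars[i])
        if vec is None:  # skip out-of-vocabulary neighbors
            continue
        total = [t + x for t, x in zip(total, vec)]
        count += 1
    return [t / count for t in total] if count else total


def select_prototype(context: Vector, prototypes: List[Vector]) -> int:
    """Hard assignment: index of the prototype most similar to the context."""
    return max(range(len(prototypes)),
               key=lambda k: cosine(context, prototypes[k]))


if __name__ == "__main__":
    # Hypothetical toy vectors: dimension 0 ~ "finance/row", dimension 1 ~ "motion".
    global_emb = {"银": [1.0, 0.0], "走": [0.0, 1.0], "行": [0.5, 0.5]}
    # Two hypothetical prototypes for the polysemous character 行.
    xing_prototypes = [[1.0, 0.1], [0.1, 1.0]]
    for text, pos in [("银行", 1), ("行走", 0)]:
        ctx = context_vector(text, pos, global_emb)
        print(text, select_prototype(ctx, xing_prototypes))
```

With these toy vectors, 行 in 银行 ("bank") is assigned prototype 0 and 行 in 行走 ("walk") is assigned prototype 1. A soft, similarity-weighted mixture of prototypes is a common alternative to this hard argmax assignment.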
Publication date: 2016